{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "72b1b316",
   "metadata": {},
   "source": [
    "# JSON parser\n",
    "This output parser allows users to specify an arbitrary JSON schema and query LLMs for outputs that conform to that schema.\n",
    "\n",
    "Keep in mind that large language models are leaky abstractions! You'll have to use an LLM with sufficient capacity to generate well-formed JSON. In the OpenAI family, DaVinci can do reliably but Curie's ability already drops off dramatically. \n",
    "\n",
    "You can optionally use Pydantic to declare your data model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "cd33369f",
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import List\n",
    "\n",
    "from langchain.chat_models import ChatOpenAI\n",
    "from langchain.prompts import PromptTemplate\n",
    "from langchain_core.output_parsers import JsonOutputParser\n",
    "from langchain_core.pydantic_v1 import BaseModel, Field"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "9b4d242f",
   "metadata": {},
   "outputs": [],
   "source": [
    "model = ChatOpenAI(temperature=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "a1090014",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define your desired data structure.\n",
    "class Joke(BaseModel):\n",
    "    setup: str = Field(description=\"question to set up a joke\")\n",
    "    punchline: str = Field(description=\"answer to resolve the joke\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "4ccf45a3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'setup': \"Why don't scientists trust atoms?\",\n",
       " 'punchline': 'Because they make up everything!'}"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# And a query intented to prompt a language model to populate the data structure.\n",
    "joke_query = \"Tell me a joke.\"\n",
    "\n",
    "# Set up a parser + inject instructions into the prompt template.\n",
    "parser = JsonOutputParser(pydantic_object=Joke)\n",
    "\n",
    "prompt = PromptTemplate(\n",
    "    template=\"Answer the user query.\\n{format_instructions}\\n{query}\\n\",\n",
    "    input_variables=[\"query\"],\n",
    "    partial_variables={\"format_instructions\": parser.get_format_instructions()},\n",
    ")\n",
    "\n",
    "chain = prompt | model | parser\n",
    "\n",
    "chain.invoke({\"query\": joke_query})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "37d801be",
   "metadata": {},
   "source": [
    "## Streaming\n",
    "\n",
    "This output parser supports streaming."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "0309256d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'setup': ''}\n",
      "{'setup': 'Why'}\n",
      "{'setup': 'Why don'}\n",
      "{'setup': \"Why don't\"}\n",
      "{'setup': \"Why don't scientists\"}\n",
      "{'setup': \"Why don't scientists trust\"}\n",
      "{'setup': \"Why don't scientists trust atoms\"}\n",
      "{'setup': \"Why don't scientists trust atoms?\", 'punchline': ''}\n",
      "{'setup': \"Why don't scientists trust atoms?\", 'punchline': 'Because'}\n",
      "{'setup': \"Why don't scientists trust atoms?\", 'punchline': 'Because they'}\n",
      "{'setup': \"Why don't scientists trust atoms?\", 'punchline': 'Because they make'}\n",
      "{'setup': \"Why don't scientists trust atoms?\", 'punchline': 'Because they make up'}\n",
      "{'setup': \"Why don't scientists trust atoms?\", 'punchline': 'Because they make up everything'}\n",
      "{'setup': \"Why don't scientists trust atoms?\", 'punchline': 'Because they make up everything!'}\n"
     ]
    }
   ],
   "source": [
    "for s in chain.stream({\"query\": joke_query}):\n",
    "    print(s)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "344bd968",
   "metadata": {},
   "source": [
    "## Without Pydantic\n",
    "\n",
    "You can also use this without Pydantic. This will prompt it return JSON, but doesn't provide specific about what the schema should be."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "dd3806d1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'joke': \"Why don't scientists trust atoms? Because they make up everything!\"}"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "joke_query = \"Tell me a joke.\"\n",
    "\n",
    "parser = JsonOutputParser()\n",
    "\n",
    "prompt = PromptTemplate(\n",
    "    template=\"Answer the user query.\\n{format_instructions}\\n{query}\\n\",\n",
    "    input_variables=[\"query\"],\n",
    "    partial_variables={\"format_instructions\": parser.get_format_instructions()},\n",
    ")\n",
    "\n",
    "chain = prompt | model | parser\n",
    "\n",
    "chain.invoke({\"query\": joke_query})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a4d12261",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
